4 research outputs found

    Advanced fuzzy matching in the translation of EU texts

    In today's translation industry, CAT tool environments are an indispensable part of the translator's workflow. Translation memory (TM) systems are among the most important features of these tools, which naturally raises the question of how best to use them to make the translation process faster and more efficient. This research examines whether there are more efficient methods of retrieving potentially useful translation suggestions than the ones currently used in TM systems. We are especially interested in whether more sophisticated algorithms and the inclusion of linguistic features in the matching process lead to a significant improvement in the quality of the retrieved matches. The dataset used, the DGT-TM, is pre-processed and parsed, and a number of matching configurations are applied to the data structures contained in the resulting parse trees. We also try to improve the matching by combining the individual metrics using a regression algorithm. The retrieved matches are then evaluated automatically, based on correlations and mean scores, and by human evaluators, based on correlations of the derived ranks and scores. Ultimately, the goal is to determine whether some of these fuzzy matching metrics should be considered for implementation in commercial CAT tools to improve the translation process.
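
As a point of reference for the retrieval step described in this abstract, the following is a minimal sketch of a word-level edit-distance fuzzy match, a baseline commonly associated with TM systems. It is not the thesis's method: the abstract does not name its metrics, and the function names `edit_distance`, `fuzzy_score` and `best_match`, the whitespace tokenisation and the length normalisation are all assumptions made for illustration.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def fuzzy_score(query, candidate):
    """Similarity in [0, 1]: 1 minus the length-normalised edit distance.

    Assumption: lowercased whitespace tokens; real tools tokenise differently.
    """
    q, c = query.lower().split(), candidate.lower().split()
    longest = max(len(q), len(c))
    if longest == 0:
        return 1.0  # two empty segments are treated as identical
    return 1.0 - edit_distance(q, c) / longest


def best_match(query, memory):
    """Return (score, source, target) for the highest-scoring TM entry.

    memory is a list of (source, target) segment pairs (hypothetical layout).
    """
    scored = ((fuzzy_score(query, src), src, tgt) for src, tgt in memory)
    return max(scored, key=lambda item: item[0])
```

Calling `best_match(segment, memory)` with a list of `(source, target)` pairs returns the closest stored segment and its score. Any linguistically informed metric could replace `fuzzy_score` without changing the retrieval loop.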

    Semantiska strukturer för adjektiv som betecknar smaker i svenskan och kroatiskan [Semantic structures of adjectives denoting tastes in Swedish and Croatian]

    The aim of this master's thesis is to explore a number of basic taste terms in Swedish and Croatian in order to compare and better understand the linguistic and cognitive mechanisms that shape the domain of taste. As one of the five traditional senses, taste and its associated lexemes offer a suitable area for testing some of the fundamental theses of cognitive linguistics, for example those concerning embodied language and the universal tendency to develop new, more abstract meanings from basic physical experiences. It is therefore interesting to compare what happens to these lexemes in two particular linguistic and cultural systems and to observe how each organises the semantic structures within this domain. Another reason to focus on the domain of taste is that it has received little attention in linguistic research, especially compared with, for example, the visual or auditory domains (Popova 2005: 396, Backhouse 1994: 1).
    This master's thesis examines the semantic structure of adjectives denoting the basic tastes in Swedish and Croatian. The thesis is conceived as a contrastive study of both monolingual and parallel corpus data for the two languages, while its interpretative models and main ideas are grounded in the tenets of cognitive linguistics. The aim is to establish and compare tendencies in the use of the lexemes. The analyses give insight into different features of that use: their morphosyntactic patterns, affective value, contextual traits and underlying mental concepts, all of which contribute to the interpretation of the different meanings they can realise in the two languages. In the last part we focus on a single lexical pair, bitter and gorak, interpreting the corpus results with the cognitive linguistic model based on the schematic concept and analysing the data statistically using multiple correspondence analysis and logistic regression. The analyses reveal a number of interesting tendencies in the use of the adjectives, some shared by both languages and some specific to Swedish or to Croatian.

    Discourse-Related Language Contrasts in English-Croatian Human and Machine Translation

    We present an analysis of a number of coreference phenomena in English-Croatian human and machine translations. The aim is to shed light on the differences in the way these structurally different languages make use of discourse information and to provide insights for the development of discourse-aware machine translation systems. The phenomena are automatically identified in parallel data using annotation produced by parsers and word alignment tools, enabling us to pinpoint patterns of interest in both languages. We make the analysis more fine-grained by including three corpora representing three different registers. In a second step, we create a test set of these challenging linguistic constructions and use it to evaluate the performance of three MT systems. We show that both SMT and NMT systems struggle with these discourse phenomena, even though NMT tends to perform somewhat better than SMT. By providing an overview of patterns that occur frequently in actual language use, and by pointing out the weaknesses of current MT systems that commonly mistranslate them, we hope to contribute to the effort of resolving the issue of discourse phenomena in MT applications.
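
To illustrate how word alignments can help pinpoint a pattern of interest in parallel data, here is a minimal sketch that flags English subject pronouns with no aligned target token. This is a hypothetical pattern chosen for illustration (Croatian allows subject pronouns to be omitted). The abstract does not specify which patterns or tools were used, so the pronoun list, the function names and the `i-j` alignment string layout (the so-called Pharaoh format emitted by several aligners) are assumptions.

```python
# Assumption: a small closed list stands in for parser-provided POS tags.
ENGLISH_SUBJECT_PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}


def parse_alignment(line):
    """Parse an 'i-j i-j ...' alignment string into (source, target) index pairs."""
    return [tuple(int(n) for n in pair.split("-")) for pair in line.split()]


def unaligned_pronouns(source_tokens, alignment):
    """Return (index, token) for source pronouns that align to no target token."""
    aligned_sources = {src for src, _ in alignment}
    return [(i, tok) for i, tok in enumerate(source_tokens)
            if tok.lower() in ENGLISH_SUBJECT_PRONOUNS
            and i not in aligned_sources]


# Constructed example, not taken from the paper's corpora:
# English "She reads the book" vs. Croatian "Čita knjigu".
source = ["She", "reads", "the", "book"]
links = parse_alignment("1-0 3-1")  # reads-Čita, book-knjigu
candidates = unaligned_pronouns(source, links)
```

A real pipeline would rely on the parser's part-of-speech and dependency labels rather than a word list, and would treat an unaligned pronoun only as a candidate for manual or further automatic inspection.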
